Fully-online construction of suffix trees and DAWGs for multiple texts

Authors

  • Takuya Takagi
  • Shunsuke Inenaga
  • Hiroki Arimura
Abstract

We consider fully-online construction of indexing data structures for multiple texts. Let T = {T1, ..., TK} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts, in which, once a new character is appended to the kth text Tk, the previous texts T1, ..., Tk−1 remain static. Our fully-online scenario arises when we index multi-sensor data. We propose fully-online algorithms which construct the directed acyclic word graph (DAWG) and the generalized suffix tree (GST) for T in O(N log σ) time and O(N) space, where N and σ denote the total length of texts in T and the alphabet size, respectively.
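The fully-online setting can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it is a naive shared suffix trie that takes quadratic time and space in the worst case, and it is shown only to make the update model concrete (a character may be appended to any text at any time). The class name `NaiveFullyOnlineIndex` and its methods are hypothetical names chosen for this illustration.

```python
class NaiveFullyOnlineIndex:
    """Naive shared suffix trie over K growing texts.

    Illustration only: worst-case quadratic, unlike the O(N log sigma)
    DAWG / GST algorithms described in the abstract.
    """

    def __init__(self, num_texts):
        self.root = {}
        # active[k] holds the trie nodes reached by every suffix of text k,
        # including the empty suffix (the root itself).
        self.active = [[self.root] for _ in range(num_texts)]
        self.texts = [""] * num_texts

    def append(self, k, ch):
        """Append character ch to text k; any k may be chosen at any time."""
        extended = [node.setdefault(ch, {}) for node in self.active[k]]
        extended.append(self.root)  # empty suffix of the extended text
        self.active[k] = extended
        self.texts[k] += ch

    def contains(self, pattern):
        """Return True if pattern is a substring of at least one text."""
        node = self.root
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True


# Interleaved updates: characters arrive for text 0 and text 1 in mixed order.
idx = NaiveFullyOnlineIndex(2)
for k, ch in [(0, "a"), (1, "b"), (0, "b"), (1, "a"), (0, "a")]:
    idx.append(k, ch)
```

After these updates the two texts are "aba" and "ba". A semi-online index would not permit the third update, because text 0 would have been frozen once text 1 started to grow.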

Similar resources

Fully-online Construction of Suffix Trees for Multiple Texts

We consider fully-online construction of indexing data structures for multiple texts. Let T = {T1, ..., TK} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts in which, after a new character is appended to the kth text ...

Bidirectional Construction of Suffix Trees

String matching is critical in information retrieval since in many cases information is stored and manipulated as strings. Constructing and utilizing a suitable data structure for a text string, we can solve the string matching problem efficiently. Such a structure is called an index structure. Suffix trees are certainly the most widely-known and extensively-studied structure of this kind. In t...

Sparse Directed Acyclic Word Graphs

The suffix tree of a string w is a text indexing structure that represents all suffixes of w. A sparse suffix tree of w represents only a subset of the suffixes of w. One application of sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version ...

On-line construction of suffix trees

An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadrat...

String Processing Algorithms

The thesis describes extensive studies on various algorithms for efficient string processing. Data available in/via computers are often of enormous size, and thus it is important and necessary to invent time- and space-efficient methods to process them. Most of such data are, in fact, stored and manipulated as strings. String matching is most fundamental in string processing, where...

Journal:
  • CoRR

Volume: abs/1507.07622

Publication date: 2015